Review: Inferential Statistics

STA4173: Biostatistics
Spring 2025

Introduction

  • In the last lecture, we focused on describing data.

    • Continuous data: mean with standard deviation, median with interquartile range
    • Categorical data: count with percentage
  • Today, we will focus on drawing conclusions about two population means using data.

    • Confidence intervals
    • Hypothesis testing

Confidence Intervals

Point Estimate

The single value of a statistic that estimates the value of a parameter.

  • Examples of point estimates:

  • It is necessary to know how good our estimation is, or to quantify our uncertainty.

Confidence Interval

A range of plausible values for the parameter based on values observed in the sample.

\text{estimate} \pm \text{margin of error}

Level of Confidence

The probability that the interval will capture the true parameter value in repeated samples. i.e., the success rate for the method.

Confidence Intervals

Confidence Intervals

  • Because CIs are a range of values, we will use interval notation,
(lower bound, upper bound)
  • where
    • lower bound = point estimate – margin of error
    • upper bound = point estimate + margin of error
  • Make sure to state your confidence intervals in numeric order.
    • i.e., the lower bound must be the smaller number and the upper bound must be the larger number.

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

(1-\alpha)100\% confidence interval for \mu_1-\mu_2

(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2 }{n_1} + \frac{s_2^2}{n_2}} where t_{\alpha/2} has \text{min}(n_1-1, n_2-1) degrees of freedom.

  • To construct this interval, we require either:
    • the two populations to be normally distributed or
    • the sample sizes are sufficiently large (n_1 \ge 30 and n_2 \ge 30)
  • R syntax:
t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       conf.level = confidence_level) 

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • Recall the Palmer penguin data,
penguins <- palmerpenguins::penguins

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • Let’s find the 95% confidence interval for the difference in average weight (body_mass_g) between male and female (sex) penguins.

  • Remember the R syntax:

t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       conf.level = confidence_level) 
  • What is the continuous variable?

  • What is the grouping variable?

  • What is the dataset name?

  • What is the confidence level?

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • Let’s find the 95% confidence interval for the difference in weight (body_mass_g) between male and female penguins.
t.test(body_mass_g ~ sex,
       data = penguins,
       conf.level = 0.95) 

    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
 -840.5783 -526.2453
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685 
  • Thus, the 95% confidence interval for \mu_F - \mu_M is (-840.6, -526.2).

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • What about the 99% confidence interval for the difference in weight (body_mass_g) between male and female penguins?
t.test(body_mass_g ~ sex,
       data = penguins,
       conf.level = 0.99) 
  • What do you expect? Recall that the 95% CI was (-840.6, -526.2).

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • What about the 99% confidence interval for the difference in weight (body_mass_g) between male and female penguins?
t.test(body_mass_g ~ sex,
       data = penguins,
       conf.level = 0.99) 

    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
99 percent confidence interval:
 -890.4112 -476.4124
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685 
  • Thus, the 99% confidence interval for \mu_F - \mu_M is (-890.4, -476.4).
    • Recall that the 95% CI was (-840.6, -526.2).

Hypothesis Testing

  • A friend of yours wants to play a simple coin-flipping game.
    • If the coin comes up heads, you win; if it comes up tails, your friend wins.
    • Suppose the outcome of five plays of the game is T, T, T, T, T.
    • Is your friend cheating?

Hypothesis Testing

  • A friend of yours wants to play a simple coin-flipping game.
    • If the coin comes up heads, you win; if it comes up tails, your friend wins.
    • Suppose the outcome of five plays of the game is T, T, T, T, T.
    • Is your friend cheating?
      • We know the probability of flipping a tail is 0.5.
      • We can compute the probability of flipping five tails in a row.

\begin{align*} P[\text{T, T, T, T, T}] &= 0.5 \times 0.5 \times 0.5 \times 0.5 \times 0.5 \\ &= 0.03125 \end{align*}

  • Is this probability low enough to believe your friend is cheating?

Hypothesis Testing

Hypothesis Testing

A procedure, based on sample evidence and probability, used to test statements regarding a characteristic of one or more populations.

  • Steps in hypothesis testing

    1. Make a statement regarding the nature of the population.

    2. Collect evidence (sample data) to test the statement.

    3. Analyze the data to assess the plausibility of the statement.

  • Note: if we have population parameters available, we do not need to perform a hypothesis test.

Hypothesis Testing: Hypotheses

Hypothesis

A statement regarding a characteristic of one or more populations.

  • In hypothesis testing, we have two hypotheses: the null and the alternative.

Null hypothesis, H_0

A statement to be tested.

  • This is a statement of no change, no effect, or no difference.
  • It is assumed true until evidence indicates otherwise.

Alternative hypothesis, H_1

A statement that we are trying to find evidence to support.

Hypothesis Testing: Hypotheses

  • One sample tests:
    • Two-tailed test
      • H_0: parameter = some value
      • H_1: parameter \ne some value
    • Left-tailed test
      • H_0: parameter \ge some value
      • H_1: parameter < some value
    • Right-tailed test
      • H_0: parameter \le some value
      • H_1: parameter > some value

Hypothesis Testing: Hypotheses

  • Two sample tests
    • Two-tailed test
      • H_0: parameter1 – parameter2 = 0
      • H_1: parameter1 – parameter2 \ne 0
    • Left-tailed test
      • H_0: parameter1 – parameter2 \ge 0
      • H_1: parameter1 – parameter2 < 0
    • Right-tailed test
      • H_0: parameter1 – parameter2 \le 0
      • H_1: parameter1 – parameter2 > 0

Hypothesis Testing: Errors

  • We use data to draw conclusions about hypotheses.
    • We will either reject or fail to reject the null (H_0).
  • If we draw the wrong conclusion, we make an error.
  • These can be classified as Type I (\alpha) or Type II (\beta) errors.
    • \alpha and \beta are probabilities (i.e., are between 0 and 1).

Hypothesis Testing: Errors

  • As stated earlier, Type I (\alpha) and Type II (\beta) errors are probabilities.
    • \alpha = \text{P}[\text{reject } H_0 \text{ when } H_0 \text{ is true}]
    • \beta = \text{P}[\text{fail to reject } H_0 \text{ when } H_1 \text{ is true}]
  • We also call \alpha the level of significance.
  • We should choose \alpha based on the level of error we are willing to withstand in the experiment.
    • The \alpha that is commonly used is \alpha=0.05.
    • Sometimes, smaller \alpha is used. e.g., clinical trial \to \alpha=0.01.
  • For a fixed sample size (n), \alpha and \beta are inversely related.

Hypothesis Testing: Test Statistics

  • After stating our hypotheses, we will construct a test statistic.
  • The choice of test statistic depends on:
    1. The hypotheses being tested.
    2. Assumptions made about the data.
  • The value of the test statistic depends on the sample data.
    • If we were to draw a different sample, we would find a different value for the test statistic.
  • We will use the test statistic on our way to drawing conclusions about the hypotheses.

Hypothesis Testing: p-Values

  • After constructing test statistics, we will find the corresponding p-value.
  • p-value: the probability of observing what we’ve observed or something more extreme, assuming the null hypothesis is true.
  • Finding a p-value depends on the distribution being used.
  • We will compare the p-value to \alpha in order to draw conclusions.
    • Reject H_0 if p < \alpha.

Hypothesis Testing: Conclusions and Interpretations

  • Once we’ve found the p-value, we can draw a conclusion.
    • If p < \alpha, we reject H_0.
      • There is sufficient evidence to suggest that H_1 is true.
    • If p \ge \alpha, we fail to reject H_0.
      • There is not sufficient evidence to suggest that H_1 is true.
  • Take aways:
    • We never “accept” the null.
    • We always interpret in terms of H_1.

** Two-Sample t-Test**

Hypothesis Test for Two Independent Means

Hypotheses

  • H_0: \mu_1-\mu_2 = \mu_0 | H_0: \mu_1-\mu_2 \le \mu_0 | H_0: \mu_1-\mu_2 \ge \mu_0
  • H_1: \mu_1-\mu_2 \ne \mu_0 | H_0: \mu_1 - \mu_2 > \mu_0 | H_1: \mu_1 - \mu_2 < \mu_0

Test Statistic t_0 = \frac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}

p-Value

  • p = 2 P[t \ge |t_0|] | p = P[t \ge |t_0|] | p = P[t \le |t_0|]

Rejection Region

  • Reject H_0 if p < \alpha.

Conclusion/Interpretation

  • [Reject or fail to reject] H_0.

  • There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].

Two-Sample t-Test

  • R syntax:
t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       mu = hypothesized_difference,
       alternative = alternative)
  • Important!!
    • We are estimating \mu_1 - \mu_2, but R is going to subtract in alphabetical or numeric order of the grouping variable.
      • e.g., if we have “Male” and “Female”, it will estimate \mu_{\text{Female}} - \mu_{\text{Male}}.
      • e.g., if we have “110” and “5”, it will estimate \mu_{5} - \mu_{110}.
      • In the case of two-tailed tets, this does not matter… but beware when doing a one-tailed test!

Two-Sample t-Test: Example

  • Consider the penguin data. Is there a significant difference in weight (body_mass_g) between male and female penguins? Test at the \alpha=0.05 level.

  • Remember the R syntax:

t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       mu = hypothesized_difference,
       alternative = alternative)
  • What is the continuous variable?

  • What is the grouping variable?

  • What is the dataset name?

  • What is the hypothesized difference?

  • What is the alternative?

Two-Sample t-Test: Example

  • Consider the penguin data. Is there a significant difference in weight (body_mass_g) between male and female penguins? Test at the \alpha=0.05 level.

  • Remember the R syntax:

t.test(body_mass_g ~ sex,
       data = penguins,
       mu = 0,
       alternative = "two")

    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
 -840.5783 -526.2453
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685 
  • Is this a significant difference?

Two-Sample t-Test: Example

Hypotheses

  • H_0: \mu_1-\mu_2 = 0
  • H_1: \mu_1-\mu_2 \ne 0

Test Statistic and p-Value

  • t_0 = -8.55
  • p < 0.001

Rejection Region

  • Reject H_0 if p < \alpha; \alpha=0.05.

Conclusion/Interpretation

  • Reject H_0.

  • There is sufficient evidence to suggest that male and female penguins have different weights.

Hypothesis Testing: Practical vs. Statistical Significance

  • Hypothesis testing depends on sample size.
  • As the sample size increases, our p-values decrease necessarily.
  • As p-values decrease, we are more likely to reject the null hypothesis.
  • We must ask ourselves if the value we are testing against makes practical sense.
    • A new weight loss medication where the average amount of weight loss was 1 lb over 6 months.
    • A new weight loss medication where the average amount of weight lost was 15 lb over 6 months.
    • A new teaching method that raised final exam scores by 2 points.
    • A new teaching method that raised final exam scores by 15 points.

Independent vs. Dependent Data

Independent data

An individual selected for one sample does not dictate which individual is to be in a second sample.

In the data, there is not a way to link the individuals in the sample.

Dependent data

An individual selected to be in one sample is used to determine the individual in the second sample.

In the data, there is a way to link the individuals in the sample.

  • Examples:
    • Two sections of STA4173
    • Project grades in one section of STA4173
    • Male and female penguins
    • Prices online vs. in store at Target

Estimating \mathbf{\boldsymbol \mu_d = \boldsymbol \mu_1-\boldsymbol \mu_2}

  • We are now interested in comparing two dependent groups.

  • We assume that the two groups come from the same population and are going to examine the difference,

d = y_{i, 1} - y_{i, 2}

  • After drawing samples, we have the following,
    • \bar{d} estimates \mu_d,
    • s^2_d estimates \sigma^2_d, and
    • n is the sample size.

Confidence Interval for \mathbf{\boldsymbol \mu_d}

\mathbf{(1-\boldsymbol\alpha)100\%} confidence interval for \mathbf{\boldsymbol\mu_d}

\bar{d} \pm t_{\alpha/2} \frac{s_d}{\sqrt{n}}

  • where t_{\alpha/2} has n-1 degrees of freedom.
  • To construct this interval, we require either:
    • the differences to be normally distributed or
    • the sample size is sufficiently large (n \ge 30)
  • R syntax:
t.test(dataset_name$variable1_name,
       dataset_name$variable2_name, 
       paired = TRUE, 
       conf.level = confidence_level)

Confidence Interval for \mathbf{\boldsymbol \mu_d}: Example

  • Insurance adjusters are concerned about the high estimates they are receiving for auto repairs from garage I compared to garage II.
  • 15 cars were taken to both garages for separate estimates of repair costs.
library(tidyverse)
garage <- tibble(g1 = c(17.6, 20.2, 19.5, 11.3, 13.0, 
                        16.3, 15.3, 16.2, 12.2, 14.8,
                        21.3, 22.1, 16.9, 17.6, 18.4), 
                 g2 = c(17.3, 19.1, 18.4, 11.5, 12.7, 
                        15.8, 14.9, 15.3, 12.0, 14.2, 
                        21.0, 21.0, 16.1, 16.7, 17.5))
  • Construct the 95% confidence interval for the average difference between the two garages.

  • Remember the R syntax:

t.test(dataset_name$variable1_name,
       dataset_name$variable2_name, 
       paired = TRUE, 
       conf.level = confidence_level)

Confidence Interval for \mathbf{\boldsymbol \mu_d}: Example

t.test(garage$g1, 
       garage$g2, 
       paired = TRUE, 
       conf.level = 0.95)

    Paired t-test

data:  garage$g1 and garage$g2
t = 6.0234, df = 14, p-value = 3.126e-05
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.3949412 0.8317254
sample estimates:
mean difference 
      0.6133333 
  • The 95% CI for \mu_d, where d = x_{\text{I}} - x_{\text{II}} is (0.39, 0.83).

  • From the problem statement:

    • Insurance adjusters are concerned about the high estimates they are receiving for auto repairs from garage I compared to garage II.
  • Can we say that estimates from garage I are higher than those from garage II?

Paired t-Test

Hypothesis Test for Two Dependent Means

Hypotheses

  • H_0: \mu_d = \mu_0 | H_0: \mu_d \le \mu_0 | H_0: \mu_d \ge \mu_0
  • H_1: \mu_d \ne \mu_0 | H_0: \mu_d > \mu_0 | H_1: \mu_d < \mu_0

Test Statistic t_0 = \frac{\bar{d}-\mu_0}{\frac{s_d}{\sqrt{n}}}

P-Value

  • p = 2 P[t \ge |t_0|] | p = P[t \ge |t_0|] | p = P[t \le |t_0|]

Rejection Region

  • Reject H_0 if p < \alpha.

Conclusion/Interpretation

  • [Reject or fail to reject] H_0.

  • There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].

Paired t-Test

  • R syntax:
t.test(dataset_name$variable1_name, 
       dataset_name$variable2_name, 
       paired = TRUE, 
       mu = hypothesized_difference,
       alternative = "alternative")
  • Important!!
    • We are estimating \mu_1 - \mu_2, but R is going to subtract in the order we state in the t.test() function.
      • e.g., if we want to examine \mu_d = \mu_{\text{freshman}} - \mu_{\text{sophomore}}:
t.test(students$freshman_gpa, 
       students$sophomore_gpa, 
       paired = TRUE, 
       mu = 0,
       alternative = "alternative")
t.test(students$sophomore_gpa, 
       students$freshman_gpa, 
       paired = TRUE, 
       mu = 0,
       alternative = "alternative")

Paired t-Test: Example

  • Let’s now formally determine if garage I’s estimates are higher than garage II’s. Test at the \alpha=0.05 level.

  • Recall the data,

garage <- tibble(g1 = c(17.6, 20.2, 19.5, 11.3, 13.0, 
                        16.3, 15.3, 16.2, 12.2, 14.8,
                        21.3, 22.1, 16.9, 17.6, 18.4), 
                 g2 = c(17.3, 19.1, 18.4, 11.5, 12.7, 
                        15.8, 14.9, 15.3, 12.0, 14.2, 
                        21.0, 21.0, 16.1, 16.7, 17.5))
  • and the R syntax:
t.test(dataset_name$variable1_name, 
       dataset_name$variable2_name, 
       paired = TRUE, 
       mu = hypothesized_difference,
       alternative = "alternative")

Paired t-Test: Example

t.test(garage$g1, 
       garage$g2,
       paired = TRUE,
       mu = 0,
       alternative = "greater")

    Paired t-test

data:  garage$g1 and garage$g2
t = 6.0234, df = 14, p-value = 1.563e-05
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 0.4339886       Inf
sample estimates:
mean difference 
      0.6133333 
  • Are the estimates from garage I significantly higher than those from garage II?

Paired t-Test: Example

Hypotheses

  • H_0: \ \mu_{\text{I}} \le \mu_{\text{II}} OR \mu_{d} \le 0, where \mu_d = \mu_{\text{I}} - \mu_{\text{II}}
  • H_1: \ \mu_{\text{I}} > \mu_{\text{II}} OR \mu_{d} > 0

Test Statistic and p-Value

  • t_0 = 6.023
  • p < 0.001

Rejection Region

  • Reject H_0 if p < \alpha; \alpha = 0.05.

Conclusion/Interpretation

  • Reject H_0.
  • There is sufficient evidence to suggest the estimates at garage I are higher than that of garage II.

Wrap Up

  • Today we reviewed statistical inference.

    • Confidence intervals
    • Hypothesis testing
  • Get to know you quiz - complete with RStudio - due today.

  • Next meeting: how to conceptualize research questions.